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ABSTRACT 

We revisit the problem of exact CMB likelihood and power spectrum estimation with the goal of 
minimizing computational cost through linear compression. This idea was originally proposed for 
CMB purposes by Tegmark et al. (1997), and here we develop it into a fully working computational 
framework for large-scale polarization analysis, adopting WMAP as a worked example. We compare 
hve different linear bases (pixel space, harmonic space, noise covariance eigenvectors, signal-to-noise 
covariance eigenvectors and signal-plus-noise covariance eigenvectors) in terms of compression effi¬ 
ciency, and find that the computationally most efficient basis is the signal-to-noise eigenvector basis, 
which is closely related to the Karhunen-Loeve and Principal Component transforms, in agreement 
with previous suggestions. Eor this basis, the information in 6836 unmasked WMAP sky map pixels 
can be compressed into a smaller set of 3102 modes, with a maximum error increase of any single 
multipole of 3.8% ai i < 32, and a maximum shift in the mean values of a joint distribution of an 
amplitude-tilt model of O.OObtr. This compression reduces the computational cost of a single likelihood 
evaluation by a factor of 5, from 38 to 7.5 CPU seconds, and it also results in a more robust likelihood 
by implicitly regularizing nearly degenerate modes. Finally, we use the same compression framework 
to formulate a numerically stable and computationally efficient variation of the Quadratic Maximum 
Likelihood implementation that requires less than 3 GB of memory and 2 CPU minutes per iteration 
for £ < 32, rendering low-£ QML CMB power spectrum analysis fully tractable on a standard laptop. 
Subject headings: cosmic microwave background — cosmology: observations — methods: statistical 


1. INTRODUCTION 

Through a series of increasingly sensitive experiments 
measuring the cosmic microwave background (CMB), led 
by COBE (Mather et al. 1990), WMAP (Bennett et al. 
2013) and Planck (Planck Collaboration 2014)), cosmol- 
ogists have during the last two decades established a suc¬ 
cessful cosmological concordance model (Planck Collab¬ 
oration 2014). According to this model, the universe 
is isotropic and homogeneous, and filled with Gaussian 
random fluctuations drawn from a nearly scale-invariant 
primordial power spectrum; its energy budget is made up 
by 68% dark energy, 27% dark matter and 5% baryonic 
matter. Remarkably, only six or seven parameters are 
required to model accurately millions of data points. 

The connection between those millions of data points 
and the handful of cosmological parameters is made 
through the so-called likelihood function, and cosmolog¬ 
ical parameter estimation essentially amounts to map¬ 
ping out this function, for instance using Markov Chain 
Monte Carlo (Lewis & Bridle 2002), multi-dimensional 
gridding (Mikkelsen et al. 2013) or non-linear optimiza- 
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tion (Planck Collaboration 2014). Since the CMB fluc¬ 
tuations are observed to be (at least close to) Gaussian 
distributed, the analytic expression for the likelihood is 
formally given by a multivariate Gaussian. However, this 
expression is of limited practical use for modern CMB 
experiments, because of the high dimensionality of the 
associated covariance matrix. For Planck, the number 
of pixels is A^pix ~ 5 x 10^, and since brute-force like¬ 
lihood evaluation requires a Cholesky decomposition of 
this matrix, computationally scaling as 0{Np^^), a single 
evaluation would cost CPU years and require and 
require 10^ TB RAM (see, e.g., Borrill 1999, for a related 
discussion). 

Obviously, the direct brute-force likelihood approach is 
not feasible for modern full-sky CMB experiments, and 
a few alternative methods have therefore been proposed 
and implemented in the literature. These can largely be 
broken into two groups. First, the most widely adopted 
approach is that of a hybrid likelihood, which simply 
splits the full likelihood into two components according 
to angular scales. Large angular scales are analyzed us¬ 
ing some exact method that fully accounts for the non- 
Gaussian nature of the likelihood, whereas small angu¬ 
lar scales are analyzed using faster likelihood approxi¬ 
mations motivated by the Central Value theorem. Usu¬ 
ally, the two likelihoods are sufficiently uncorrelated that 
they may be joined into a single all-scale expression ei¬ 
ther by straight multiplication or by explicitly account¬ 
ing for overlap correlations (Gjerlpw et al. 2013). The 
second group of methods may be characterized as sam¬ 
plers, as for instance implemented through Gibbs sam¬ 
pling (Jewell et al. 2004; Wandelt et al. 2004; Eriksen et 
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al. 2004), which draws samples from the CMB posterior. 
The computational scaling of this approach is 0{N^Q), 
and therefore computationally feasible even for high res¬ 
olution data. However, further work is required for this 
potential to be fully realized, as the computational ex¬ 
pense is still considerable (Seljebotn et al. 2014), and in 
practice samplers are still mostly used on large and in¬ 
termediate angular scales (Planck Collaboration 2014). 

Although one could argue that the ever-advancing 
progress of computer technology lessens the need for 
clever likelihood approximations, one could at the same 
time argue that reducing the time needed to perform 
likelihood evaluations allows us to expand our field of in¬ 
terest. As an example, there is little to gain in terms of 
computational time if we restrict our interest to the stan¬ 
dard 6-parameter AC DM model, which can presently be 
tackled by a standard laptop in a comfortable time frame. 
Presently, however, considering extensions to this model 
is gaining more and more interest. Such extensions ex¬ 
pand the parameter space of interest, which reintroduces 
the need to make our likelihood evaluations as fast as 
possible while still being reasonably accurate. 

In this paper we revisit the problem of exact brute- 
force likelihood evaluation on large angular scales, ex¬ 
ploiting the ideas initially introduced for CMB anal¬ 
ysis purposes by Tegmark et al. (1997) to reduce the 
computational cost through linear compression. Rather 
than crudely downgrading the data in pixel space un¬ 
til the computational costs are acceptable, we compress 
the data into a lower-dimensional basis set using a more 
general linear transformation, thereby reducing compu¬ 
tational costs while retaining by far most of the impor¬ 
tant information. Further, we also show how this formal¬ 
ism naturally leads to a very efficient implementation of 
the Quadratic Maximum Likelihood (QML) power spec¬ 
trum estimator. 

2. BASIC DEFINITIONS 

In their most basic form, CMB observations may be 
modelled^ as a linear sum of a cosmological CMB signal, 
s, observed by some instrumental beam convolution op¬ 
erator, B, some set of foreground contaminants, f, and 
random noise, n, 

d = Bs + f -f n. (1) 

The signal and noise terms are usually both assumed to 
be Gaussian distributed with zero mean and covariances 
S = B (ss‘) B* and N = (nn‘), respectively, and the 
total data covariance matrix is C = S -|- N, neglecting 
the foreground term for the moment. 

In most cases, the CMB signal is assumed to be 
isotropic, and it is therefore particularly convenient to 
expand this component into spherical harmonics, 

^max ^ 

S = 'y ^ y ^ = Ys. (2) 

£—0 jn——^ 

Here we have defined Y to be a matrix listing all spheri¬ 
cal harmonics (both spin-0 for temperature and spin-2 for 

^ Bold lower case letters denote vectors, and bold capital letters 
matrices. In pixel basis, vectors consist of the Stokes I, Q and U 
parameters stacked sequentially, and matrices consist of 3 X 3 block 
matrices containing II, IQ, lU etc. 


polarization; see Zaldarriaga & Seljak (1997) for details) 
up to some maximum band limit, €max column-wise, and 
i to be a vector containing the spherical harmonics coef¬ 
ficients of s. We additionally define the symbol Y“^ to 
denote the inverse spherical harmonic transform, 

s= f Y|^sdH = Y-is, (3) 

J 477 

but emphasize that this is not a true inverse of Y, as nei¬ 
ther Y nor Y~^ is square, and any spherical pixelization 
introduce non-orthogonality between modes on small an¬ 
gular scales; we only use Y“^ for high-£ mode filtering 
in the combination P = YY~^ in this paper, for which 
exact orthogonality is not required. 

Under the assumption of statistical isotropy, the signal 
covariance takes on a particularly simple form in spher¬ 
ical harmonic space, and is given by the angular power 
spectrum, Cf. Assuming further that the instrumental 
beam is circularly symmetric and fully described by a set 
of Legendre coefficients, bi, and that any smoothing ef¬ 
fects from discrete pixelization may be described in terms 
of an effective pixel window function, pi, the harmonic 
space elements of the signal covariance matrix reads 

= bip^ (,S£rn^£'rn'} ~ ' (4) 

where we have for simplicity defined the power spectrum 
coefficient, Ct, to denote a 3 x 3 block incorporating all 
temperature and polarization auto- and cross-spectra. 
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From Equation 2 we see that the corresponding signal co- 
variance matrix in map domain simply reads S = YSY^, 
and it is easy to show that the entries of this matrix are 
given by the two-point correlation function. 

Properly accounting for the foreground term is a far 
more complicated problem, and an extensive literature 
has been written on this topic (e.g., Leach et al. 2008; 
Planck Collaboration 2015, and references therein). In 
this paper we limit ourselves to a very basic foreground 
model in which f may be described by a finite set of 
spatial templates, each known perfectly up to an overall 
amplitude, 

f = ^ oAi = Ta, (6) 

i 

where T is a matrix listing all templates column-wise, 
and a is a vector of template amplitudes. Accounting for 
such templates is most easily implemented by solving the 
normal equations for a, and redefining the data vector 
and data covariance matrix as follows, 

d ^ d - T (T*(S -I- N)-iT)~^ T*(S + N)-^ (7) 

N^N-t-aTT‘; (8) 

here a is a parameter that estimates the uncertainty in 
the template fit, and a oo corresponds to full projec¬ 
tion. However, from a numerical point of view it is more 
convenient to set a to a large numerical value, to avoid 
an otherwise singular covariance matrix. In this paper, 
we let T consist of the monopole and three dipoles, all 
normalized to a maximum of unity, and let a = 10^. 
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With the above data model, the data likelihood de¬ 
pends only on the angular power spectrum, and is given 
by a multivariate Gaussian, 

p-id'^(S(C^)-l-N)-id 

C{CAd) = P(d\Ct) CX- . , (9) 

V'|S(C'^) + N| 

where we have implicitly accounted for template 
marginalization by the redefinitions in Equations 7 and 
8. In principle, this expression can be used directly for 
CMB power spectrum or cosmological parameter esti¬ 
mation when coupled to some non-linear optimization or 
MCMC implementation. However, as already noted, this 
expression contains both a matrix inverse and a determi¬ 
nant, and therefore scales computationally as 
Direct likelihood evaluations are therefore computation¬ 
ally very expensive, and the main goal of this paper is to 
speed up this expression simply by reducing the effective 
number of pixels. 

3. THE 9-YEAR WMAP LOW-£ LIKELIHOOD 

For pedagogical purposes, we specialize the discussion 
in this paper to the \aw-^ WMAP likelihood®, as pre¬ 
sented by Hinshaw et al. (2013). However, we note that 
the same approach should be fully applicable to cor¬ 
responding Planck low-£ polarization observations once 
available. 

The 9-year WMAP low-£ likelihood function is imple¬ 
mented as a hybrid between a pure temperature likeli¬ 
hood using a Blackwell-Rao estimator (Chu et al. 2005), 
and a pure polarization brute-force likelihood, similar 
to that described in Equation 9. Correlations between 
the two are handled by explicitly decorrelating the tem¬ 
perature component from the Stokes Q and U maps, 
given some fixed estimate of the full-sky temperature 
sky map and Cj^ (Page et al. 2007). For computa¬ 
tional speed, the polarization data are degraded onto 
a very low-resolution grid, defined by the HEALPix® 
pixelization with a resolution parameter of Wide = 8. 
This pixelization has a pixel size of 7° x 7°, and sup¬ 
ports harmonic modes reliably only up to £max = 16, al¬ 
though the WMAP likelihood implementation formally 
includes modes up to £ = 23. After applying a Galactic 
mask removing contaminated pixels, a total of 1100 low- 
resolution polarization (Q and U) pixels are included in 
the likelihood. 

This approach leads to a fast and flexible low-£ likeli¬ 
hood. However, in the process several assumptions have 
been made, most notably that the temperature noise 
is fully negligible (enabling the temperature-polarization 
split), and that the full-sky temperature modes are well 
described by the WMAP ILC map. Neither of these as¬ 
sumptions are obvious (see, e.g., Finelli et al. 2013 for 
a relevant discussion), and in particular the assumption 
of no temperature noise has significant consequences in 
terms of the effective prior of the likelihood: An absolute 
mathematical requirement for any likelihood is that the 
total covariance matrix, S-f N, is positive definite, while 
a softer physical requirement is that the signal covariance 
S alone is positive definite. Enforcing these requirements 
consistently is not trivial with a split likelihood, and we 

® http: //lambda.gsfc.nasa.gov 

® http://healpix.jpl.nasa.gov 


will see in Section 7 that the WMAP likelihood has both 
nonphysical “holes” as a result of this, as well as a gener¬ 
ally complicated behaviour near the singularity regions. 

A second issue with the WMAP likelihood implemen¬ 
tation lies in its resource requirements. To accelerate 
likelihood evaluations, the WMAP code precomputes the 
Legendre polynomials for each pair of pixels, and thereby 
saving GPU time for building the signal covariance ma¬ 
trix. However, this is costly in terms of memory, and 
requires 1 GB RAM already at A^side = 8, which only 
supports I < 16. Doubling the resolution, in order to 
probe scales up to £ < 32 increases this requirement to 
33 GB, which is more than most computers can handle 
comfortably today. 

In this paper, we present a more direct implementa¬ 
tion of a low-£ WMAP likelihood that relies only on the 
brute-force likelihood expression in Equation 9. Both 
temperature and polarization sky maps are considered 
at a common resolution parameter of Wide = 16. Oth¬ 
erwise, we adopt data combinations that are as close as 
possible to those used for the official WMAP likelihood 
(Hinshaw et al. 2013). Specifically, for polarization we 
include only the foreground-reduced WMAP Ka, Q and 
V bands in the following, not the K-band, which is used 
for foreground cleaning, or the W-band, which is known 
to have worse correlated noise and/or systematics issues 
than the other frequencies. For the temperature compo¬ 
nent, we adopt the 9-year WMAP ILC map, smoothed 
to 10° FWHM. 

The individual foreground-reduced polarization fre¬ 
quency maps are co-added into a single “clean” CMB 
map by inverse noise variance weighting, 

( 10 ) 

where is the full covariance matrix for band f, which 
also take into account the additional noise contribution 
from the foreground reduction. The noise covariance of 
the co-added map reads 

N=(ENr') ■ (11) 

We finally add 2 /iK regularization noise to the ILC tem¬ 
perature map, to make the temperature covariance ma¬ 
trix invertible. The full noise covariance thus consists of 
a diagonal temperature block and a dense polarization 
block, with no cross-terms between the two. 

We adopt the WMAP KQ85 mask for the tempera¬ 
ture component, and the P06 mask for the polarization 
components (Bennett et al. 2013), leaving a total of 2326 
temperature and 4510 polarization (Q and f/) pixels for 
analysis, or a total of 6836 elements in the data vector. 
The instrumental beams are taken to be a perfect Gaus¬ 
sian of 10° FWHM for the temperature, and a Gaussian 
of 30.6 arcmin for polarization, corresponding roughly to 
the Q-band beam, and adopted as a rough average of 
the three channels; its impact is very small for the mul¬ 
tipoles considered in the following, having a minimum 
amplitude of hi = 0.993 at £ = 30. 

4. LINEAR COMPRESSION AND BASIS DEFINITIONS 

4.1. Basic formalism 
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Fig. 1.— Example basis vectors for temperature {left column) 
and polarization [right column) for each of the five basis sets con¬ 
sidered in this paper, computed from the 9-year HrMj4Pdata. In 
each case, the basis vector with the 30th highest eigenvalue is 
shown, and only the Stokes Q component is shown for polariza¬ 
tion; the Stokes U components look qualitatively similar. 

Any linear transformation of a set of Gaussian random 
variables results in another set of Gaussian random vari¬ 
ables. Consider therefore some linear combination on the 
form 

d = Pd, (12) 


where P is some N x A^pix transformation matrix with 
N < A^pix, and d is a transformed data vector. If d 
is a zero-mean Gaussian field with covariance C, then 
d will be a zero-mean Gaussian field with covariance 
C = PCP*. With the data model described above, 
the corresponding likelihood for these compressed data 
therefore reads 


C{Ci\A)oi 


g-id^(S(Cf)+N)-M 

V'|S(Q)+N^ 

g-iP^d'^(PS(Ci)P'^-l-PNP^)-ipd 


v'|PS(Q)P^ + PNP'r| 


(13) 


The interesting question is now whether there exists some 
transformation P that retains the relevant information in 
d with a smaller number of data points, N < Npi^- 
Before explicitly defining a set of candidate bases, it 
is useful to introduce some additional notation. First, 
since there are always parts of the sky that are unavail¬ 
able for cosmological CMB analysis due to foreground 
contamination from our own Galaxy, we introduce a 
pixel space masking operator, M, defined in terms of 




Fig. 2.— Eigenvalue spectra for all five bases defined in the text, 
shown both tor temperature [top) and polarization [bottom), as 
evaluated for the 9-year WMAP data, using no high-t truncation. 
The eigenvalues are normalized to the maximum value, and sorted 
according to decreasing values; increasing values from left to right 
indicate negative eigenvalues. 

an iVjnask X Apjx matrix that contains one row for each 
unmasked pixel, with the value 1 in the column corre¬ 
sponding to the pixel number; all other entries are zero. 
When applied to a full-sky data vectors, this operator 
simply picks out the unmasked pixels, leaving all values 
numerically unchanged. 

Second, we define a harmonic space truncation oper¬ 
ator, Vfi = MY^where only spherical har¬ 
monics up to some truncation multipole, it < ^max, are 
included in the spherical harmonics operator. This op¬ 
erator filters out any spherical harmonics above it^ eval¬ 
uated only over masked pixels; since only masked pix¬ 
els are included, the operator is not a sharp operator in 
multipole space, but rather corresponds to a pseudo-o^m 
projection operator with non-zero coupling to inultipoles 
above 4 (e.g., Hivon et al. 2002). 

Third, we define [A]e as the set of eigenvectors of A 
with a fractional eigenvalue larger than e relative to the 
maximum eigenvalue. That is, let V be the matrix con¬ 
taining the eigenvectors of A, and W be the diagonal 
matrix of eigenvalues, such that A = VWV*; then [AJ^ 
contains all columns of V with an eigenvalue larger than 
e-max(W). This operator removes modes with low eigen¬ 
values, which in turn will be used to eliminate modes 
with low signal-to-noise ratio. However, because of the 
very different signal amplitudes in temperature and po- 
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larization, we define two different eigenvalue thresholds, 
ex and ep, for temperature or polarization modes, and 
set the temperature-polarization cross-elements in A to 
zero before performing the eigenvalue decomposition. 

4.2. Basis sets 

With the above notation, we define five candidate bases 
to be considered for further analysis, 

Pi = M Pixel 

P 2 = [P/j]eM Harmonic 

P 3 = [P/tN“^Pjj]eM Inverse noise 

P 4 = [P/j(S -I- N)P^]eM Signal-plus-noise 

P 5 = |p,,(S1/2n-isi/ 2)P*^],M Signal-to-noise, 

where S = S(C'f‘^) is the signal covariance matrix com¬ 
puted from some fiducial model^°. Each basis is ei¬ 
ther commonly encountered in the literature (i.e., pixels, 
harmonics, signal-to-noise eigenmodes) or have a well- 
defined specific purpose (e.g., the inverse noise basis is 
particularly well suited to test systematics by suppressing 
poorly measured modes, while the signal-plus-noise basis 
corresponds to numerical regularization of the data co- 
variance matrix). It is also worth noting that the signal- 
to-noise basis is closely related to the Karhunen-Loeve 
(or Principal Component) transform originally proposed 
for cosmological applications by Tegmark et al. (1997). 
Potential dependence on the assumed fiducial spectrum, 
C^'^, is considered in Section 7; we find no significant 
detrimental effects by adopting a power spectrum far 
from the best-fit spectrum. 

There are two tunable parameters in this framework, 
and e, both of which have a very intuitive interpretation: 
Lowering £t removes high-.^ spherical harmonic modes, 
while increasing e removes low signal-to-noise modes. 
However, it is important to note that no choice of either 
it or e can ever bias the power spectrum, but only modify 
the uncertainties. Linear compression simply amounts to 
removing irrelevant modes, and is mathematically fully 
equivalent to removing masked pixels. However, lowering 
^max (as opposed to it) will both bias the power spectrum 
and increase the because it changes the data model, 
not simply the data selection. This is an important dif¬ 
ference between our approach and that implemented by 
the official WMAP polarization likelihood code, which 
simply downgrades the actual sky maps from fVside = 16 
to 8 . 

Before proceeding with basis optimization, it is useful 
to build some intuition about the various basis candi¬ 
dates. In Figure I we therefore show an example basis 
vector, and in Figure 2 we show the eigenvalue spectrum, 
for each basis as computed from the 9-year WMAP data 
(Section 3). The example basis vectors all correspond to 
the vector with the 30th largest eigenvalue, £ 30 , for both 
temperature and polarization. Only the Stokes Q field 
is shown for polarization, as Stokes U looks qualitatively 
similar. 

Starting with the pixel basis, we see in Figure 1 that 
each pixel corresponds in this case to an independent ba¬ 
sis vector. Furthermore, as seen in Figure 2, the eigen- 
spectrum is completely flat, and no truncation limit, e, 

We set when constructing the basis signal 

covariance matrix in the following analyses, to ensure good sam¬ 
pling of both spectra spectra. 


can remove any degrees of freedom. The pixel basis is 
therefore always complete, and all information stored in 
the uncompressed data is (by definition) retained with 
this basis. In the following, we adopt the pixel basis 
as the reference against which we measure data loss for 
other bases. 

The second row in Figure 1 shows a temperature and 
polarization mode of the spherical harmonics basis, and 
the dashed line in Figure 2 shows its eigenspectrum. 
Both of these highlight a problematic feature with this 
particular basis: It is susceptible to numerical errors at 
high multipoles. Ideally, should be identically 

equal to one for i < it and zero otherwise. However, 
because all operations are done on a finite pixelization 
that supports only a finite number of multipoles, there is 
always some leakage between multipoles in this operator. 
Furthermore, one is not guaranteed that / 
will be smaller than one. On the contrary, the worst- 
behaved modes often have a square-integral substantially 
larger than one. This is observed as three distinct re¬ 
gions in the eigenspectrum of the spherical harmonics 
basis: The flat plateau from about 50 to 1500 corre¬ 
sponds to well resolved modes with good support on the 
masked sky; the rapid decrease above 1500 corresponds 
to modes that are filtered either by the high-^ truncation 
operator or are degenerate because of the sky mask^^, 
and, finally, the modes below 50 are numerically unsta¬ 
ble high-£ modes that have an eigenvalue larger than 1 . 
The harmonic mode shown in Figure 1 is an example of 
such a mode. 

The third basis corresponds to the eigenvectors of the 
inverse noise covariance matrix. For the low-resolution 
WMAP data, this matrix is given by a spatially constant 
regularization noise RMS amplitude of 2^K for temper¬ 
ature, and the actually measured instrumental noise co- 
variance for polarization, including both scanning strat¬ 
egy and correlated noise effects. For temperature, the 
inverse noise basis functions are therefore identical to 
the pixel basis, with one pixel per value, with one excep¬ 
tion: This basis explicitly highlights the effect of fore¬ 
ground template projection in the form of a sharp drop 
in the eigenspectrum, corresponding to the monopole and 
dipoles modes that are manually assigned a numerically 
large uncertainty. For polarization, the dominant fea¬ 
ture is the scanning strategy, which is clearly seen in 
the example basis mode in Figure 1. This basis may be 
useful for systematics studies, since instrumental system¬ 
atics are often strongly associated with poorly measured 
modes. 

The fourth basis is defined as the eigenvectors of the 
total data covariance matrix, S -|- N. This could be a rel¬ 
evant basis for cases that have an ill conditioned covari¬ 
ance matrix, as for instance often happens for strongly 
signal-dominated temperature data, S >> N. Since S 
by construction is spanned by (£max + 1 )^ < iVpix modes, 
this situation leads to a poorly conditioned total covari¬ 
ance matrix that needs to be regularized before further 
analysis. The two most common approaches are either 
to add a small amount of white noise to the data (known 
as “regularization noise”) or to increase ^max beyond the 

Note that the absolute value of the eigenvalue is plotted here; 
the sharp feature indicates the mode for which the eigenvalue be¬ 
comes negative. 



Fig. 3. — Regularizing the likelihood with a condition number 
prior. Dashed curves show slices through while fixing all 

other multipoles at their maximum-likelihood points. Solid lines 
show the condition number of the data covariance matrix, S + N, 
as a function of power spectrum. Thick curves show results with no 
prior on the condition number, and thin curves show results when 
requiring the condition number to be smaller than 50 000. Condi¬ 
tion number regularization eliminates likelihood artifacts near the 
singularity boundary. 

Nyquist limit formally supported by the pixelization. A 
third option would be to use the S + N basis proposed 
here, which simply removes by hand poorly conditioned 
modes from the data set. 

Finally, the fifth basis is given by the eigenvectors 
of the signal-to-noise covariance matrix, 
written in an explicitly symmetric form to minimize nu¬ 
merical errors. In this case, a prior spectrum is intro¬ 
duced that allows one to select modes based on individual 
signal-to-noise ratio, only retaining those that actually 
contribute with useful information. Several variations of 
this has already been discussed extensively in the liter¬ 
ature, resulting in various implementations of the same 
underlying ideas, two of which are the Karhunen-Loeve 
and Principal Component transforms. This was also the 
basis set originally proposed by Tegmark (1997). For a 
related application to non-Gaussianity, see Rocha et al. 
( 2001 ). 

4.3. A condition number based prior 

Before proceeding to further analysis, we make a mi¬ 
nor comment on a technical issue already mentioned in 
Section 3, namely that the total data covariance ma¬ 
trix, C must be positive definite in order for the like¬ 
lihood to be well defined. Intuitively and not mathemat¬ 
ically rigorously, this condition breaks down whenever 
Sim,tm < —Nirn,im- However, the likelihood surface ac¬ 
tually become unstable well before this limit because of 
numerical errors, as illustrated in Figure 3. The only 
difference between the main frame and the inset is the 
x-axis range. First, the dashed thick line shows an (ar¬ 
bitrarily normalized) slice through our pixel basis likeli¬ 
hood for C{C 2 ^), keeping all other multipoles fixed at 
their maximum-likelihood values. This slice exhibits per¬ 
fectly normal behaviour for large values of follow¬ 

ing roughly the behaviour of an inverse Gamma distri¬ 
bution. However, near the value of = —0.0175/iK^ 


the likelihood rapidly increases, and essentially diverges 
to infinity. 

This behaviour is a generic feature of any likelihood 
near the boundary at which it becomes singular: Even 
if the matrix may be positive definite and invertible, the 
numerical value cannot be trusted sufficiently near the 
singularity boundary. Fortunately, this problem can be 
resolved in several ways, and our preferred solution is by 
monitoring the covariance matrix condition number, i.e., 
the ratio of the largest to smallest eigenvalue. This quan¬ 
tity is shown as a solid thick line in Figure 3 for the above 
case. For any power spectrum value not close to the sin¬ 
gularity boundary, we see that the condition number is 
highly stable, with a numerical value around 15 000 for 
this particular case. However, once the covariance matrix 
approaches singularity, it starts to increase rapidly, and it 
does so sooner than the actual likelihood. The combina¬ 
tion of a high degree of stability within the main param¬ 
eter volume, and rapid increase toward the edges makes 
the condition number an effective monitor of the likeli¬ 
hood robustness. Therefore, rather than requiring that 
the covariance matrix simply must be positive definite (as 
for instance enforced by the official WMAP likelihood), 
we demand that the condition number must be smaller 
than some pre-defined threshold. The specific value of 
this threshold must be determined by some initial likeli¬ 
hood scans, but in practice this is very straightforward. 
For the above basis, we adopt a numerical threshold of 
50000, and the resulting regularized likelihood is shown 
as a dashed thin line. The slight difference with respect 
to the unregularized likelihood at higher values is due 
to the slightly different maximum-likelihood power spec¬ 
trum coefhcients at other multipoles caused by the same 
prior. 

5. EFFICIENT AND STABLE QML IMPLEMENTATION 

The formalism described in Section 4 can be used 
to derive a computationally efficient variation of the 
Quadratic Maximum Likelihood (QML) estimator, ini¬ 
tially introduced by Tegmark (1997) and Bond et al. 
(1998) as an efficient route to the maximum-likelihood 
CMB power spectrum. For example applications, see, 
e.g., Gruppuso et al. (2009, 2011, 2013) and references 
therein. Let C_b = dC/dCb denote the derivative of 
the data covariance matrix with respect to some power 
spectrum parameter, Cb, with b = {£i,... ,in}, where n 
denotes the number of multipoles included in the spec¬ 
tral bin. The first derivative and Fisher matrix of the 
log-likelihood may then be written as 

^ = itr [(dd* - C)(C-iC,,C-i)] (14) 

Fbb^ = hr ; (15) 

see Section HC of Bond et al. (1998) for full details. 

The QML estimator is now defined as follows: 

1. Make some initial guess at the power spectrum. 



2. Update the spectrum according to the following 
rule, 

Cf =C<‘-‘'+pF-'V^ ( 16 ) 
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3. Iterate until convergence 

This algorithm is closely related to the Newton-Raphson 
optimization method, with the one difference that it em¬ 
ploys the (computationally cheaper) Fisher matrix in¬ 
stead of the curvature matrix. The two algorithms con¬ 
verge to the same (maximum-likelihood) solution (Bond 
et al. 1998). 

In this paper, we note that Equations 14 and 15 can be 
slightly rewritten to facilitate fast numerical evaluation. 
Specifically, the signal matrix may be written as C = 
PYSYtp* -b N, where S is the full-sky signal covariance 
matrix in harmonic space, and all geometry and data 
selection effects are encoded in the constant projection 
operators P and Y. The derivative of this matrix with 
respect to Cb reads 

8C 

— = PYIfcYtp‘, (17) 

where If, is a harmonic space matrix containing the value 
1 for entries containing in S for I' € 6, and otherwise 
0 ; it is very sparse, and multiplication with this matrix 
is fast. 

Inserting this expression into Equation 14 and 15, and 
noting that the trace operator is invariant under cyclic 
permutations, we see that 

^ = -tr [(Ytp‘C-i)(dd‘ - C)(C-iPY)I,] (18) 

Fbb' = itr [(Ytp*C-ipY)Ifa(Y+P*C-ipY)Ib,] . 

(19) 

While these expressions look somewhat formidable at 
first sight, they are in fact computationally very efficient. 
Starting with the first derivative, the important point 
is that all multipole dependencies have been factorized 
away from expensive dense matrix products. After pre- 
computing PY (which only has to be done once for every 
basis set) and grouping the matrix products as indicated 
with parentheses in the above equations, the computa¬ 
tional cost of the first derivative is given by only two 
matrix products plus one Cholesky factorization/solve, 
and the total memory consumption is equivalent to four 
dense matrices. The memory consumption is indepen¬ 
dent of the number of power spectrum bins, and the 
CPU time is only weakly dependent on the number of 
bins, involving only a single sparse trace evaluation. 

A similar consideration holds for the Fisher matrix. In 
this case, the main computational cost lies in evaluating 
YtptC"ipY once, at the cost of one Cholesky factor¬ 
ization/solve and one matrix multiplication. Computing 
the remaining product and traces is computationally fast, 
because of the high sparsity of the If, operator. 

The iterative QML algorithm as described above has 
one major weakness: The power spectrum proposed in 
iteration i does not necessarily yield a positive definite 
total data covariance matrix, C. This typically hap¬ 
pens whenever one or more likelihood conditionals have 
a sharp edge beyond which (symbolically) Sim < —Nim, 
which is not uncommon in the noise dominated regime 
(see Section 7 for explicit examples). 

As a safe-guard again this problem, we modify the 
QML algorithm as follows: 



Fig. 4.— Number of modes required for the Fisher uncertainty to 
increase by no more than 10% relative to the pixel basis, including 
modes between 2 < £ < 32 and plotted as a function of truncation 
multipole. 

1. Make some initial guess at the power spectrum. 



2. Update the spectrum according to the following 
rule, 

Cf =Cr"+a^(F-)„^, (20) 

where the step length, a, maximizes We 

implement the latter optimization with a standard 
line optimizer (linmin; Press et al. 2007). 

3. Convergence is defined when the log-likelihood has 
changed by less than 0.1 over the last three itera¬ 
tions 

The underlying intuition is simply to tune the step size 
along the proposed QML direction such that the likeli¬ 
hood is maximized. Each step will necessarily lead to 
a higher likelihood value, and the algorithm cannot di¬ 
verge. 

Unfortunately, this stability comes at a non-negligible 
computational cost, as one now has to perform a non¬ 
linear optimization within each main QML iteration, and 
this operation requires repeated likelihood evaluations. 
However, since each likelihood evaluation is quite fast due 
to the compression step described above (after all, the 
likelihood function is designed to be an active component 
in an MCMC cosmological parameter estimation frame¬ 
work), this is not a showstopper; the benefit of additional 
stability more than compensates for this expense. 

Before turning to applications, we make one note re¬ 
garding error estimation. Often, is adopted as 

an uncertainty on the QML estimate, a choice that is pri¬ 
marily driven by computational efficiency. In this paper, 
we quote asymmetric 68% confidence limits, computed 
by mapping out the likelihood conditionally around the 
maximum likelihood point for each parameter, and find¬ 
ing the smallest range that encompass 68% of the condi¬ 
tional likelihood volume. 

6. BASIS OPTIMIZATION 













22 24 26 28 30 32 24 26 28 30 32 
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Fig. 5.— Relative Fisher uncertainty increase as a function of 
truncation multipole for each of the four cosmologically interesting 
power spectra {Cj”^, Cj^, and C®®), evaluated for the 

signal-to-noise basis. 

We now turn our attention to basis set optimization, 
considering each of the five candidates defined in Sec¬ 
tion 4 as applied to the 9-year WMAP data described 
in Section 3. In our framework, basis optimization corre¬ 
sponds simply to determining the harmonic space trunca¬ 
tion multipole, £t, and eigenvalue thresholds, ex and ep, 
that result in the smallest number of accepted modes un¬ 
der the constraint that the information content over some 
range of multipoles is conserved. For the main analysis, 
we consider 2 < £ < 32 to be the multipole range of in¬ 
terest, matching that of the official WMAP temperature 
likelihood, and as a secondary test, we consider a case 
in which the polarization range is constrained to .^ < 10, 
directly targeting the low-£ EE reionization peak. 

The goal of our first test is to compare the efficiency 
of the five candidate bases. For this, we base our statis¬ 
tic on the Fisher information: For each combination of 
truncation multipole and eigenvalue thresholds, we com¬ 
pare Fisher uncertainty (i.e., for the proposed ba¬ 

sis with the corresponding value computed from the full 
pixel basis. We then require that the uncertainty must 
not increase by more than 10% for any multipole within 
the range of interest. Thus, this test is designed only to 
compare the relative compression efficiency of the vari¬ 
ous bases, not measure absolute information content, as 
correlations between multipoles are not properly quanti¬ 
fied. 

The results from these calculations are summarized in 
Figure 4, plotting the lowest number of accepted modes 
for each basis as a function of truncation multipole. 
First, we see that higher truncation multipoles gener¬ 
ally require more modes in the basis in order to produce 
stable low-£ results. This makes sense because many of 
the newly added high-£ modes have a higher eigenvalue 
than the some of the previous low-£ modes, and therefore 
more modes have to be included to retain the same low-£ 
information. Selecting modes based on harmonic con¬ 
tent rather than eigenvalue would circumvent this issue. 
Second, and this is the main point of the plot, we see 
that the four candidate bases behave quantitatively dif¬ 



Fig. 6.— Two-parameter amplitude-tilt likelihoods for various 
eigenmode cutoffs for the signal-to-noise basis (broken contours), 
compared to the corresponding likelihood evaluated from the pixel 
basis (solid contours). The contours indicate the peak and 1, 2 and 
3(7 confidence regions, respectively. 

ferently: While the inverse noise basis require more than 
4500 modes to produce robust results for high truncation 
multipoles, only 3500 modes are needed in the signal-to- 
noise basis, corresponding to a reduction of 22% in num¬ 
ber of modes or a theoretical speed-up of 2. The two 
other bases lie in between, and are fairly close to each 
other. All four candidate bases achieve substantial com¬ 
pression compared to the original pixel basis including 
a total of 6836 modes. In the following, we adopt the 
signal-to-noise basis as our default compression basis. 

In Figure 5 we plot the relative Fisher uncertainty in¬ 
crease as a function of truncation multipole for each of 
the four cosmologically interesting power spectra (Cj^, 
and for this basis. Decreasing 4 gradu¬ 

ally from 32 to 26, we observe two main effects. First, the 
most striking feature is that the uncertainties increase 
by almost an order of magnitude for any multipoles at 
(.> It- However, they do not become infinite, because of 
the non-orthogonality introduced by the mask. In other 
words, there is information about high multipoles in cut- 
sky harmonics (“pseudo-o^ms”). Conversely, the second 
effect is that the uncertainty also on multipoles below it 
increase when removing high-£ modes, gradually increas¬ 
ing the low-^ noise floor. 

From Figure 4 we know that no Fisher uncertainties 
between 2 < £ < 32 increase by more than 10% when 
including 2500 modes or more in the signal-to-noise ba¬ 
sis. However, this is a quite crude criterion, and not 
a sufficient criterion for establishing a proper produc¬ 
tion likelihood; for this we have to make sure that cor¬ 
relations are also properly accounted for. We there¬ 
fore define a more directly applicable statistic through a 
simple two-parameter amplitude-tilt model on the form 
Ce{q,n) = q {i/ipivot)^ Cf^, and map out the 2D {q,n) 
likelihood for each effective basis. The search is done in 
terms of number of modes, and only the signal-to-noise 
basis is subjected to this analysis. 

A subset of the results derived in this calculation is 
shown in Figure 6, in the form of 2D likelihood contours. 
Here we see that when including only 2473 modes, which 
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resulted in less than 10% increase in any single multi¬ 
pole error bar, the integrated uncertainties over the entire 
range leads to significant changes. However, the agree¬ 
ment rapidly improves when adding more modes, and 
with 3102 modes the agreement with the pixel basis is 
very good. To quantify this statement, we calculate the 
integrated absolute difference between the two distribu¬ 
tions, 

A = y \Ci—C2\dqdn, (21) 

and compare the resulting parameter with that computed 
from two bi-variate Gaussians with identical covariances 
but different means. Numerically, we find a value of 
A = 0.002 for the basis including 3102 modes, which 
corresponds to a shift of O.OOOcr for two bi-variate Gaus¬ 
sians. Recomputing the Fisher uncertainties with this 
new basis, we find that the relative error increase is now 
smaller than 3.8% for all multipoles. 

We emphasize that this particular number of modes 
is not to be taken as a universal prescription, and in 
general will depend on the signal-to-noise ratio of the 
data in question. However, we should note that the re¬ 
quired number of modes needed for convergence lie close 
to the number of modes that are left after truncating the 
multipole expansion for temperature and polarization at 
(. = 32, namely 3 ■ (32 -|- 1)^ = 3267, with the difference 
between this number and the required number of modes 
(3102) perhaps explained by the low signal-to-noise ratio 
of the polarization data. 

Figure 7 shows the eigenspectrum of the signal-to-noise 
eigenbasis once again, this time with the two proposed 
eigenmode cutoffs marked as vertical lines. To achieve 
reasonable accuracy on individual multipoles, it is suffi¬ 
cient to include only the high signal-to-noise modes in the 
flat high eigenvalue plateaus. However, to properly ac¬ 
count for correlations, it is important to also include the 
modes that lie in the rapidly dropping regime; these are 
partially degenerate modes that still carry some informa¬ 
tion. On the other hand, beyond this rapid decrease the 
remaining eigenvalues are for all practical purposes are 
zero, and can be excluded safely. Note that this region 
starts around (32 -|- 1)^ and 2 • (32 -|- 1)^ for tempera¬ 
ture and polarization, respectively, which make intuitive 
sense, given that these are the number of modes left after 
truncating the multipole expansion aX I = 32. Thus, fu¬ 
ture basis optimization can be performed quite simply by 
computing the the eigenspectrum of the signal-to-noise 
basis, and determining the cutoff at which the numeri¬ 
cally singular region begins. 

Before concluding this section, we show in Figure 8 
the CPU time per likelihood evaluation as a function of 
number of basis modes (top panel), as well as the cor¬ 
responding speed-up (bottom panel). Each point in this 
plot is computed as the average of 50 consecutive single- 
CPU evaluations. For the pixel basis that includes 6836 
modes, each likelihood evaluation requires 35 CPU sec¬ 
onds, while each signal-to-noise basis evaluation (includ¬ 
ing 3102 modes) requires 7.5 CPU seconds. The realized 
speed-up is thus a factor of 5, which, although significant, 
is lower than the theoretical limit of (35/7.5)^ = 11 by 
a factor of two. On the other hand, all expensive oper¬ 
ations are implemented using standard Lapack routines, 
which are already highly optimized, and fully gaining this 




Fig. 7.— Normalized temperature (top) and polarization (bot¬ 
tom) eigenvalue spectrum for the signal-to-noise basis with it = 32. 
The vertical lines indicate the cutoffs determined by the Fisher 
uncertainty (dashed) and amplitude—tilt (dotted) analyses. For 
robust results, all modes below the sharp decreases should be in¬ 
cluded. 



Number of basis modes 


Fig. 8 . — CPU time per likelihood evaluation as a function of 
number of basis modes (top), and corresponding speed-up relative 
to pixel basis evaluation (bottom). The average was calculated from 
50 evaluations running on a single CPU. Vertical lines indicate the 
timing estimates for the two bases found in the text. 

factor is not trivial. 

7. POWER SPECTRUM, LIKELIHOOD AND PARAMETERS 



















Multipole moment, / 

Fig. 9.— 9-year WMAP QML power spectrum estimates as 
a function of QML iteration (from blue to red). Error bars are 
asymmetric 68% confidence limits computed from the conditional 
likelihood evaluated around the maximum-likelihood point for the 
last iteration. 

In this section we assess the performance of the com¬ 
pressed likelihood formalism in terms of the CMB power 
spectrum, likelihood and cosmological parameters. Fig¬ 
ure 9 shows the low-.^ WMAP temperature and polariza¬ 
tion power spectrum as derived with the QML estima¬ 
tor described in Section 5, using the signal-to-noise basis 
that contains 3102 basis modes from the last section. 
Different colors show the result after different number 
of QML iterations going from few (blue) to many (red). 
The error bars indicate the asymmetric 68% errors for 
the last iteration. Figure 10 shows the corresponding 
log-likelihood as a function of iteration. 

Several interesting features may be seen in these plots. 
First of all, the initial guess adopted for this calcula¬ 
tion was the best-ht Planck 2013 model, indicated as 
dashed lines in Figure 9, while formal convergence was 
achieved after 12 iterations. However, we see that al- 



Fig. 10.— Log-likelihood of the QML power spectrum estimate 
as a function of QML iteration. 

ready a single QML iteration results in a solution that 
for most multipoles is quite close to the actual maximum- 
likelihood solution. For exploratory work, for instance 
when trying to understand the effect of systematics on 
low-£ polarization studies, a single-iteration QML power 
spectrum approximation may be quite useful, providing 
a near optimal power spectrum estimate in less than 2 
minutes of CPU time. However, for final analysis it is 
clear that several iterations are indeed highly desirable, 
as the log-likelihood increases by more than Aln£ = 5.5 
and = —2Aln£ by more than 11. As a concrete 
example, the temperature quadrupole converges slowly 
because of its intrinsically non-Gaussian shape, and re¬ 
quires at least 8 iterations before stabilizing. 

In Figure 11 we compare three different power spec¬ 
trum estimates, all computed by maximum-likelihood 
techniques using the 9-year WMAP data, but with differ¬ 
ent underlying likelihoods. Green points are derived di¬ 
rectly from the official WMAP low-£ likelihood through a 
non-linear multivariate Powell search (Press et al. 2007); 
blue points are derived from the pixel basis likelihood 
described in this paper using the iterative QML estima¬ 
tor; and red points are derived from the corresponding 
signal-to-noise basis that includes 3102 modes. Figure 
12 compare individual conditional likelihood slices com¬ 
puted from the WMAP likelihood and the signal-to-noise 
basis. 

Overall, the agreement between the three spectra is 
very good, and for most multipoles the relative shifts are 
much less than Icr. However, there are notable differences 
as well, and perhaps the most striking is the different be¬ 
haviour of the error estimates associated with the Cj^ 
spectrum. Considering first the WMAP spectrum in Fig¬ 
ure 11, a total of 16 out of 31 power spectrum coefficients 
have a vanishing error bar either toward low of high val¬ 
ues, indicating a maximum likelihood point that lies on a 
sharp likelihood boundary. For comparison, for the pixel 
and signal-to-noise basis likelihoods only 4 and 0 out of 
31 coefficients show similar behaviour, respectively. This 
difference is primarily due to the different effective priors 
imposed by the two approaches; while the WMAP like¬ 
lihood only requires the data covariance to be positive 
definite, we impose the stronger criterion that the condi¬ 
tion number must also be well behaved (see Section 4.3). 
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Fig. 11. — Comparison of maximum-likelihood power spectra 
derived from three different WMAP \ow-l likelihood implementa¬ 
tions. Error bars indicate asymmetric 68% confidence regions com¬ 
puted from conditional likelihood slices around the joint maximum- 
likelihood point. 

The latter prior prevents the nonlinear search algorithm 
from finding non-physical power spectrum solutions near 
the singularity boundary with artificially high likelihood 
values, which in turn forces correlated power spectrum 
coefficients away from their best-fit values. 

We also note that the pixel-based likelihood error bars 
are, in a sense, less symmetric than those of the signal- 
to-noise basis. This is most likely a cause of the fact that 
the pixel-based likelihood is more ill-conditioned due to 
the higher number of redundant modes. This degeneracy 
makes the likelihood slices behave worse for the pixel- 
based case than for the signal-to-noise basis. 

A different but related issue is seen in the plot of 
C{C§^) in Figure 12. Here one can see that the WMAP 
likelihood allows significantly negative values of 
but not values very close to zero; there is a “hole” in the 
likelihood surface. This is an artifact of the temperature- 
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Fig. 12. — Comparison of conditional likelihood slices computed 
from the official WMAP likelihood (solid lines) and the signal- 
to-noise basis defined in the current paper (dashed lines). Other 
multipoles are fixed at the best-fit Planck 2013 ACDM power spec¬ 
trum. 



Fig. 13. — Cosmological parameter distributions evaluated with 
CosmoMC for three different likelihoods; the official WMAP likeli¬ 
hood (black), the signal-to-noise basis derived in the current paper, 
truncated at £ = 32 (red), the same but truncated £ = 10 for polar¬ 
ization (blue), and the signal-to-noise basis using a reversed power 
spectrum (green) 


polarization split implemented by the WMAP likelihood, 
in that positive definiteness is assessed separately for the 
temperature and polarization components, effectively re¬ 
sulting in three independent criteria (i.e., Cj^ > 0 for 
the Blackwell-Rao temperature component, |S-kN| > 0 

for the polarization component; and Cj^ < -\J 

for the hybrid likelihood). With single joint likelihood 
implemented in this paper, this problem becomes much 
simpler, in that there is only a single (condition number 
based) numerical prior, and an optional physical prior. 
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< y ^ whose valid parameter volume lies 

fully within the numerical prior region. 

Our final test relates to cosmological parameters, as es¬ 
timated using CosmoMC (Lewis & Bridle 2002) coupled 
to different versions of the 9-year WMAP likelihood. All 
cases used the same high-£ likelihood, including Cf^ for 
33 < £ < 1200 and Cj^ for 33 < £ < 800, while at low 
€s four different variations were considered: 

1. The standard WMAP \aw-l hybrid temperature- 
polarization likelihood 

2. The signal-to-noise basis likelihood derived in Sec¬ 
tion 6 including 3102 modes. 

3. A similar signal-to-noise basis likelihood as (2), but 
with polarization truncated at £ = 10. 

4. The same as (2), but with the fiducial power spec¬ 

trum used for basis definition extracted from an 
incorrect part of the original spectrum, specifically 
Df, ^ Hiooo-^ with Df, = + l)/27r. 

The latter is a simple test of potential sensitivity to the 
assumed fiducial spectrum. 

The results from these calculations are summarized in 
Figure 13 and Table 1. First and foremost, we see that 
all results are highly robust against these variations, with 
a maximum change of any marginal mean of at most 
0.2cr. Of course, most of these parameters are domi¬ 
nated by small-scale information, but also the optical 
depth of reionization, r, which depends critically on the 
low-£ EE spectrum shows very small variations. Even fil¬ 
tering away all polarization multipoles above i only 
affects the results by O.lScr. Finally, reversing the power 
spectrum does not make any difference whatsoever, with 
results that are identical to the default case up to the 
second digit in the uncertainties. 

8. SUMMARY 

Building on an idea proposed by Tegmark et al. (1997), 
we have developed a framework for efficient low-^ CMB 
polarization likelihood analysis using linear compression, 
and we have applied this framework to the 9-year WMAP 
data. Five different basis definitions were compared in 
terms of compression efficiency, and, in agreement with 
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earlier suggestions, we find that an optimal basis may 
be defined in terms of the eigenvectors of 
picking out modes with high signal-to-noise. Within this 
basis, the original low-^ WMAP data set comprising 6834 
pixels may be compressed onto a smaller set of 3102 ba¬ 
sis vectors with negligible loss of accuracy, reducing the 
computational cost of a single likelihood evaluation by a 
factor of five. 

Next, we have used the same framework to implement 
an efficient and stable version of the Quadratic Maximum 
Likelihood power spectrum estimator, slightly re-writing 
the expressions for the covariance matrix derivatives to 
use explicit projection operator. The corresponding code 
requires about 3 GB of memory and 2 CPU minutes per 
QML iteration for the WMAP data at a HEALPix res¬ 
olution of Nside = 16, which is well within the capabili¬ 
ties of a standard laptop. Additionally, we have shown 
how to stabilize the QML estimator, and avoid regions 
of parameter space in which the data covariance matrix 
become non-positive definite. This increases the over¬ 
all computational cost by a small factor, as it relies on a 
non-linear optimization within each main QML iteration, 
but for most cases this is not a major problem. 

On a related topic, we have introduced a new and more 
effective prior for removing nonphysical artifacts on the 
likelihood surface. Previously, this was done by only re¬ 
quiring the data covariance matrix to be positive definite, 
but this leaves significant anomalies near the singularity 
boundary. A better option is to put a constraint on the 
condition number of the covariance matrix. 

Finally, we note that while only the WMAP likelihood 
was considered in this paper, we expect that the methods 
presented here should be directly applicable to the up¬ 
coming Planck polarization data. 
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TABLE 1 

Summary of cosmological parameters 



Default WMAP 
Constraint 

P TRUNC 
Constraint 

- 32 

Deviation (cj) 

P TRUNC 
Constraint 

- 10 

Deviation (c) 

Reversed power spectrum 
Constraint Deviation (c) 


0.0226 ± 0.0005 

0.0227 ±0.0005 

0.12 

0.0227 ± 0.0005 

0.09 

0.0227 ± 0.0005 

0.12 


0.114 ±0.005 

0.114 ±0.005 

0.11 

0.114 ±0.005 

0.09 

0.114 ±0.005 

0.13 

9 

1.040 ± 0.002 

1.040 ± 0.002 

0.06 

1.040 ± 0.002 

0.04 

1.040 ± 0.002 

0.05 

T 

0.088 ±0.014 

0.089 ± 0.014 

0.04 

0.086 ±0.014 

0.18 

0.089 ±0.014 

0.02 

Us 

0.974 ±0.013 

0.976 ± 0.014 

0.14 

0.976 ±0.013 

0.11 

0.976 ±0.013 

0.15 

log[10i°As] 

3.10 ±0.03 

3.10 ±0.03 

0.001 

3.09 ± 0.03 

0.20 

3.10 ±0.03 

0.03 

Ho 

69.4 ±2.17 

69.7 ±2.22 

0.13 

69.7 ±2.21 

0.11 

69.7 ±2.21 

0.14 


Note. — Parameters derived with four different low-f WMAP likelihood implementations; the official WMAP low-£ likelihood, and 
the signal-to-noise basis likelihoods derived in this paper, truncated at f = 32 and 10, respectively, for polarization, and one truncated at 
£ = 32 with a reversed power spectrum as the basis signal. The fourth, sixth, and eighth columns show the relative shifts with respect to 
the WMAP approach measured in units of a. The confidence intervals are Icr, and the best-fit points are marginal posterior means. 
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